Representative Selection in Nonmetric Datasets
نویسندگان
چکیده
This paper considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements. This subset needs to inherently reflect the type of information contained in the entire set, while minimizing redundancy. For such purposes, clustering may seem like a natural approach. However, existing clustering methods are not ideally suited for representative selection, especially when dealing with non-metric data, where only a pairwise similarity measure exists. In this paper we propose δmedoids, a novel approach that can be viewed as an extension to the k-medoids algorithm and is specifically suited for sample representative selection from non-metric data. We empirically validate δ-medoids in two domains, namely music analysis and motion analysis. We also show some theoretical bounds on the performance of δ-medoids and the hardness of representative selection in general.
منابع مشابه
Nonmetric multidimensional scaling: Neural networks versus traditional techniques
In this paper we consider various methods for nonmetric multidimensional scaling. We focus on the nonmetric phase, for which we consider various alternatives: Kruskal’s nonmetric phase, Guttman’s nonmetric phase, monotone regression by monotone splines, and monotone regression by a monotone neural network. All methods are briefly described. We use sequential quadratic programming to estimate th...
متن کاملFast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
متن کاملSFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....
متن کاملA hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with small number ofinstances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improvethe classification performance of microarray datasets by selecting the significant features. Combining the concepts ofrough sets, weighted rough set, fuzzy rough se...
متن کاملFeature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Applied Artificial Intelligence
دوره 29 شماره
صفحات -
تاریخ انتشار 2015